ABSTRACT

School effectiveness indices (SEIs), based on 
regressing test performance onto earlier test performance and a 
socioeconomic status measure, were obtained for eight subject-grade 
combinations from 485 South Carolina elementary schools. The analysis 
involved school means based on longitudinally matched student data. 
Reading and mathematics achievement data were gathered from the South 
Carolina Basic Skills Assessment Program tests, the Comprehensive 
Tests of Basic Skills, and the Cognitive Skills Assessment Battery. 
Grades one through four were included. The resulting SEIs were found 
to be somewhat unstable across subject areas and very unstable across 
grades. Grade-to-grade correlations of the SEIs measuring mathematics 
performance, although small, were largely significant whereas those 
measuring reading performance were generally nonsignificant. This 
suggested that school effects may be more readily discernible in some 
subject areas than in others. Implications were drawn for effective 
schools research and for school incentive award systems based on 
student test performance. (Author/GDC) 
Abstract

School effectiveness indices (SEIs), based on regressing test
performance onto earlier test performance and an SES measure, were
obtained for eight subject-grade combinations for a large sample of
elementary schools. The analyses involved school means based on
longitudinally matched student data. The resulting SEIs were found to
be somewhat unstable across subject areas (reading and mathematics)
and very unstable across grades (one through four). Grade-to-grade
correlations of the SEIs measuring mathematics performance, although
small, were largely significant whereas those measuring reading
performance were generally nonsignificant. This suggested that school
effects may be more readily discernible in some subject areas than in
others. Implications were drawn for research on effective schools and
school incentive award systems based on student test performance.
Stability of SEIs

For a number of years, researchers have been attempting to
examine how well individual schools have been doing in their efforts
to foster important educational outcomes in the children who attend
them. Most frequently this examination has utilized quantitative
indicators of overall student performance and the focus has been on
school accountability, "school effectiveness", and the more recent
efforts to reward schools whose students have exhibited exceptional
achievement. (None of these movements should be confused with the
estimation of what have been called "school effects" (e.g., Coleman,
et al., 1966) in which the objective has been to estimate the portion
of achievement variation which can be attributed to schools in general
after various background factors have been taken into account.)

The recent impetus in this area is related to state- and
district-level programs to monitor school performance on the basis of
student achievement test data. In some cases, these programs lead to
recognition of high performing schools and, in a few cases, monetary
awards to the schools and/or their personnel (see, e.g., Wynne,
1984). At the state level, California, Florida, and South Carolina now
have school award programs in which test scores are a major factor in
the determination of awardees. District-level programs include those
of the Dallas Independent School District and the Montgomery County
(MD) Public Schools, to name but a few. In some cases district-level
programs derive from the "effective schools" literature, the objective
being to identify and then study high performing schools rather than
simply to reward them in some way.

This interest has led to a number of research papers dealing
with methodological problems in the identification of schools to
receive recognition based on student achievement. Although these
papers have rather diverse objectives, they tend to concentrate on
comparing the results from utilizing the various methodologies which
have been proposed. When similar methodologies have been compared
(e.g., assorted regression approaches) the results tend to be quite
consistent (e.g., Webster and Olson, 1984; Abalos, Jolly, and
Johnson, 1985), but when the procedures vary on major dimensions, the
reverse is usually true (e.g., Frechtling, 1982; Frederick and
Clauset, 1985). In these latter situations, researchers'
recommendations concerning which procedure to use have often been
based on equity issues such as lack of bias toward low SES,
underachieving, or minority children.

In much of the earlier "effective schools" research, schools
were identified based on the performance of a rather limited sample of
their student body (e.g., students at one grade level for one year)
and in some cases the performance of these students in only one
subject area (e.g., reading) was considered. Although other
methodological problems received more attention, critics such as
Rowan, Bossart and Dwyer (1983) and Ralph and Fennessey (1983) have
also taken these researchers to task for the rather limited nature of
many of these earlier studies.

In some cases, researchers of "school award" algorithms have
also been guilty of limiting the purview of their analyses, but the
trend is toward computing two or more indices at each grade level and
then aggregating these indices to the school level. Although the issue
of how best to conduct this aggregation is beginning to receive some
attention (Abalos, Jolly, and Johnson, 1985), researchers do not seem
to be concerned about whether this aggregation is sensible from a
psychometric standpoint.

This paper will investigate the consistency of what will be
termed "school effectiveness indices" (SEIs) as a function of grade
level and subject matter. This issue is worthy of study for two
reasons. First, in some cases, the criticisms noted above concerning
limited grade level and subject matter coverage are still relevant.
Second, it addresses an assumption implicit in aggregation, namely
that comparable indices are being aggregated. SEIs will be
constructed for the two basic skills areas of reading and mathematics
for each of grades one through four for a large sample of elementary
schools and the consistency of the resultant indices will be
considered. Implications for the identification of "effective
schools" and those to receive awards will be discussed.

Theoretical and Empirical Background 
Rowan, Bossart and Dwyer (1983) have identified the following
four general approaches to the creation of SEIs: (1) the use of
absolute standards such as school means and comparing them to national
norms, (2) analyzing trends in test scores for a given grade level
over a period of years, (3) analyzing trends in test scores for a
given cohort of students as they progress through a school, possibly
comparing their performance to national normative data, and (4)
various methods based on residuals from a regression analysis. As
noted above, the results from applying various somewhat distinct
approaches have been demonstrated to be quite inconsistent. The
family of approaches which appears to have the most empirical support,
however, is that based on regressing achievement onto prior
achievement and some measure of socio-economic status (SES). For
these reasons, this was the general approach selected for this
investigation.

An early, rather well-known regression-based methodology was
developed by Dyer and his colleagues (Dyer, Linn and Patton, 1969;
Dyer, 1970). Basically an educational accountability system, in the
"student change model of an educational system" student outcomes are
regressed onto student inputs and "hard-to-change" variables, with the
residuals serving as the SEIs (or as the basis of what Dyer called
"performance indicators" (PI)). The PI metric was simply a five point
scale based on the standardized residuals. One important
characteristic of Dyer's system was the use of longitudinal data, the
justification being that, "the only fair index of school effectiveness
is one that rests on input-output data obtained only on those pupils
with whom the school staff has been in continuous contact over a
specified period of months or years." (Dyer, 1970, p. 206). Results
presented by Hilton and Patrick (1970) demonstrated differences
between school aggregate indices based on matched longitudinal,
unmatched longitudinal, and cross-sectional data. Related results in
Dyer, et al. (1969) also provided empirical support for this position.

A related issue involves the unit of analysis to use in the
regression analysis. Dyer, et al. (1969) found that the residuals
from an individual level regression analysis aggregated to the school
level were highly correlated (median r = .93) with the residuals from
an analysis involving school means. However, they also found that the
individual level analysis produced SEIs which were slightly more
stable. O'Connor (1972), however, noted that the aggregation of the
individual level residuals to the school level produces summary values
which are correlated with both inputs and predicted outputs since the
individual level regression coefficients do not provide the least
squares solution to the problem of interest. For these reasons and
the empirical support provided by Frechtling (1982), regression of
school means was selected as the analytic strategy for this
investigation.

As noted above, Dyer et al. (1969) provided some assessment of
the stability of the SEIs generated from the four regression
strategies they considered. The study involved 64 school systems
(rather than schools) with standardized achievement test results for
eighth graders in the 1960-61 school year being regressed onto the
corresponding scores of these students when they were fifth graders.
The sample from each school system was split into two random
subsamples of equal size and, for the analysis involving school means,
the intercorrelations between the pairs of SEIs ranged from .62 to .84
depending on the subject area being tested. When these correlations
were stepped-up using the Spearman-Brown formula to reflect the
reliability of the composite (a more appropriate index of the
stability for total sample; see O'Connor, 1972) the coefficients
ranged from .77 to .91.
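
As a hedged illustration of the step-up, the Spearman-Brown formula
for two equal halves is r22 = 2 r11 / (1 + r11). The short Python
sketch below applies it to the lowest split-half correlation quoted
above. The function name is an illustrative assumption and does not
come from Dyer et al. (1969) or O'Connor (1972).

```python
def spearman_brown_two_halves(r_half: float) -> float:
    """Step up a split-half correlation to the reliability of the full sample."""
    return 2.0 * r_half / (1.0 + r_half)


# Lowest split-half correlation quoted in the text for Dyer et al. (1969).
print(round(spearman_brown_two_halves(0.62), 2))  # 0.77
```

The stepped-up value is always at least as large as the split-half
value for positive correlations, which is why the total-sample
coefficients exceed the subsample correlations.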

In a similar investigation, Marco (1974) studied a sample of
third grade Title I students enrolled in 70 elementary schools in the 
Midwest. Standardized achievement test scores were once again used as 
both input and output measures with spring posttest scores regressed 
onto fall pretest data. The subject area was reading and the 
reliability reported was .83 for the analysis involving school mean 
residuals. This compares favorably with the stepped-up results from 
Dyer, et al. (1969) which were .77, .80, and .86 on the vocabulary, 
reading and language subtests. Although these reliability 
coefficients are quite high, it is important to remember that the only 
factor allowed to vary was the sampling of students from a given 
school, grade, and year. Therefore, although they could possibly be 
used to justify computing SEIs for subsamples of students from large 
schools, they provide no evidence for the consistency of SEIs when 
these important factors are allowed to vary. 

Forsyth (1973) studied the stability issue as it relates to
two successive classes in a particular school. Although differing
slightly in some technical details from the studies cited above, the
same basic approach of using standardized test results as both inputs
and outputs and analyzing school means (n = 50) was employed. Outputs
were the twelfth grade standardized test scores for two successive
classes (graduating in 1968 and 1969) and their test results as ninth
graders (in 1965 and 1966) were the inputs. For the nine subtest
scores of the Iowa Tests of Educational Development (and the
Composite) the correlations among the residuals for the two years
ranged from .11 (Quantitative Thinking and Vocabulary) to .50 (Social
Studies) with a median of .28. Forsyth considered the consistency of
classifications in the five category PI metric and noted that perfect
agreement was rare (16 percent to 36 percent). He then argued that
for many applications a difference of one category on the PI scale may
be sufficiently consistent. Using this criterion, between 62 percent
and 88 percent of the schools were "consistently" categorized
depending on the subtest under consideration.

In a recent paper Helmstadter and Walton (1985) have presented 
correlations of SEIs across four elementary grades (third through 
sixth) and three subject matter areas (math, reading, and language). 
Based on regression analyses of school means, within grade 
correlations among the three subject area SEIs were quite large
(roughly between .7 and .9). Correlations of SEIs across grades
within the same subject area were somewhat smaller, typically between
.4 and .6. Although based on large samples (40,000 students per grade
in 450 or more schools) the details of the regression analyses are 
unclear. It is unlikely, however, that the research was based on 
longitudinal data. 

In a study conducted by Matthews, Soder, Ramey and Sanders
(1981) using longitudinally matched data for students attending the
Seattle Public Schools, the results were not so positive. Student
level residuals using earlier achievement and various SES measures
produced SEIs which were quite inconsistent across grades (the grade
span was the second to the eighth grade), subject areas (reading and
math), and years (1978-79 and 1979-80). The authors discussed but
presented no specific results dealing with inconsistencies of positive 
and negative outliers as a function of subject and grade. As far as 
differences as a function of year are concerned, however, they noted 
that in some cases, a school was identified as a positive outlier one 
year and a negative outlier the next. Year-to-year correlations of 
SEIs computed at the same grade level ranged from -.24 to .44, none of 
which were statistically significant because of the small number of 
schools involved. 

Methods and Data Sources 
For a number of years, the state of South Carolina has had a 
policy of statewide testing of students in the majority of the grades 
in the K-12 grade span. Criterion referenced tests (CRT) used as a 
part of the Basic Skills Assessment Program (BSAP) are administered 
each spring to all students in grades 1,2,3,S,8, and 11. Students in 
grades 4, 7, and 10 are tested, also in the spring, with the 
Comprehensive Tests of Basic Skills (CTBS). In addition, the
Cognitive Skills Assessment Battery (CSAB) is administered at the 
beginning of the first grade as a readiness test. 

The BSAP tests are relatively short and include reading, 
mathematics, and writing at the higher grade levels. The reading and 
mathematics subtests contain 36 and 30 multiple choice items 
respectively. Scale scores are available for the BSAP tests.

This study was limited to the 485 elementary schools in South
Carolina which contain grades one through four (and possibly
additional grade levels). Student records for the Spring 1985 testing
were matched with the corresponding test records for the previous
testing (Spring 1984) with one exception. The first grade BSAP
records were matched with the corresponding (Fall 1984) CSAB records.
Schools with fewer than 20 matched (and complete) student records at
each of the four grade levels were eliminated from consideration,
reducing the number of schools to 423. In order to obtain stability
data comparable to the data presented in the studies cited above, each
school-grade sample was split into two random subsamples of equal
size. BSAP scale scores in reading and mathematics (grades 1-3), and
expanded scale scores for the Total Reading and Total Mathematics
subtests of the CTBS (fourth grade) based on the Spring 1985 testing
were used as the output variables for each of the four grade cohorts.
The "year earlier" BSAP scale scores (for students in grades 2-4 in
spring of 1985) or the CSAB raw score (for 1985 first graders)
were considered to be student input variables. Variables representing
the percentage of children eligible for free lunches and the
percentage eligible for reduced price lunches in 1985 were used as
"hard-to-change" variables and students whose records indicated that
they were handicapped were eliminated. Regression analyses of the
school subsample mean outputs onto the mean inputs and the two lunch
percentages were conducted for each of the eight subsamples. Although
not precisely in keeping with Dyer's prescription, studentized
residuals for reading and mathematics were used as the SEIs.
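
The regression step described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
the authors' actual procedure: it uses NumPy, internally studentized
residuals (the text does not say which variant was used), and
synthetic school means with made-up variable names and coefficients
in place of the South Carolina data.

```python
import numpy as np


def studentized_residuals(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Internally studentized OLS residuals (X holds predictors, no intercept)."""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares fit
    resid = y - A @ beta
    # Leverage of each school: diagonal of the hat matrix A (A'A)^-1 A'.
    leverage = np.einsum("ij,jk,ik->i", A, np.linalg.pinv(A.T @ A), A)
    s2 = resid @ resid / (n - A.shape[1])          # residual variance estimate
    return resid / np.sqrt(s2 * (1.0 - leverage))


# Synthetic school means (NOT the South Carolina data), for illustration only.
rng = np.random.default_rng(0)
n_schools = 200
prior_mean = rng.normal(700.0, 20.0, n_schools)    # mean input (earlier test)
pct_free = rng.uniform(0.0, 80.0, n_schools)       # % eligible for free lunch
pct_reduced = rng.uniform(0.0, 20.0, n_schools)    # % eligible for reduced price
output_mean = (100.0 + 0.8 * prior_mean - 0.3 * pct_free
               + rng.normal(0.0, 8.0, n_schools))  # mean output (current test)

X = np.column_stack([prior_mean, pct_free, pct_reduced])
sei = studentized_residuals(X, output_mean)        # one index per school
```

Each element of `sei` plays the role of one school's SEI for a single
subject-grade combination: positive values mean the school mean was
higher than predicted from its inputs and lunch percentages.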

Reliability coefficients reflecting the consistency of the
within-grade subsample SEIs were computed for purposes of comparison
with the results cited above. Intraclass correlations (r11) were
obtained to reflect the stability of the subsample SEIs, and were
stepped-up (r22) to reflect the reliability of the results which might
be expected for the total sample. Intraclass correlations were
selected over the more common Pearson (interclass) correlations since
they are more appropriate measures of consistency of the results for
the randomly created subsamples. Because of the large sample size,
the biased estimator was considered sufficient (see Winer, 1971, p.
287).
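
To make the distinction between intraclass and Pearson correlations
concrete, the sketch below computes one common biased (pairwise) form
of the intraclass correlation for two subsample SEIs per school, in
which both subsamples share a pooled mean and variance. Whether this
is the exact estimator given in Winer (1971) is an assumption; the
function name and the toy values are illustrative only.

```python
def intraclass_pairs(x1, x2):
    """Pairwise intraclass correlation using a pooled mean and pooled variance."""
    n = len(x1)
    pooled_mean = (sum(x1) + sum(x2)) / (2 * n)
    pooled_var = (sum((a - pooled_mean) ** 2 for a in x1)
                  + sum((b - pooled_mean) ** 2 for b in x2)) / (2 * n)
    cross = sum((a - pooled_mean) * (b - pooled_mean)
                for a, b in zip(x1, x2)) / n
    return cross / pooled_var


# Toy SEIs for three schools: subsample 2 is subsample 1 shifted by one unit.
sub1 = [1.0, 2.0, 3.0]
sub2 = [2.0, 3.0, 4.0]
icc = intraclass_pairs(sub1, sub2)
```

A Pearson correlation for these toy values would be 1.0 because the
two lists are perfectly linearly related, whereas the intraclass
correlation is lower because it also penalizes the constant shift
between subsamples.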

SEIs were then recomputed for the total sample. As a matter
of interest, these SEIs were correlated with the average of the two
subsample SEIs in order to verify that the stepped-up stability
coefficients were reasonable in reference to the results based on
total samples. To address the main issues in this paper, correlations
between the reading and mathematics SEIs within a grade and among the
four grade-specific sets of SEIs were obtained. If these results
warranted further analysis, the SEIs were dichotomized in order to
simulate the selection of schools for an award and the consistency of
these decisions was considered using the Kappa coefficient. Finally,
similar results were considered in terms of indices obtained by
aggregating across the two subject matter areas and the four grade
levels.

Results of Preliminary Analyses

A summary of the results of the regression analyses is
presented in Table 1. As has been mentioned, these analyses were
conducted for all schools in South Carolina with 20 or more usable
matched records at the grade level under consideration. To clarify,
schools with grades 1-3 were eligible for inclusion in the analyses
for those three grades but were not included in the final sample of
423 schools. This final sample of 423 schools contained approximately
30,000 first graders with between 20 and 216 matched first-grade
records per school; roughly 25,000 second graders with between 22 and
152 per school; approximately 24,000 third graders with 22 to 140 per
school; and about 24,500 fourth graders with between 21 and 153 per
school.

Insert Table 1 about here

The results as presented in Table 1 are seen to be quite 
stable across subsamples but the multiple correlations are somewhat
smaller than those reported by Dyer, et al. (1969). It is likely that
the primary reason for this finding is that in the Dyer study the
output measures were obtained from eighth graders, older students than 
the first through fourth graders considered here. The data in Table 1 
support the common finding that student (and therefore school) 
achievement is more accurately predicted for older than for younger 
students. A second explanation for these results might be use of the 
shorter CRTs at most grade levels. 

We also observe that achievement in reading across grade 
levels is predicted more precisely than achievement in mathematics, a 
result which tends to be consistent with studies cited above which 
dealt with children in the early grades (e.g., Webster and Olson, 
1984). This interesting finding suggests that more variation in the 
reading performance of young children can be accounted for in terms of 
factors such as readiness, previous achievement, and SES than is true 
of their mathematics performance. A likely causal variable would be 
the amount of preschool training, possibly at home, and probably 
concentrated on skills associated with reading. Thus it appears as 
though there exists more "free" variation in mathematics than reading
which suggests that schools could potentially have more of an impact
in this basic skills area. Two somewhat curious findings are: (1)
that the percentage of students eligible for reduced price lunches is
a predictor of reading but not mathematics achievement, and (2) that
previous mathematics performance is not a significant predictor (in
the context of the other predictors) of second grade mathematics
achievement.

Insert Table 2 about here

In Table 2 the results of correlating the subsample SEIs are
presented. The results indicate that performance in mathematics
(median stepped-up reliability of .86) was somewhat more stable across
subsamples than performance in reading (median reliability of .78).
The .78 value for reading compares favorably with the corresponding
result obtained by Dyer, et al. (1969) and presented in stepped-up
form as .80 by O'Connor (1972). The Dyer results, however, did not
indicate that mathematics performance was more stable than reading
performance as suggested in Table 2.

Results of Primary Analyses 

The results presented above have characterized the consistency
of results as they pertain to the sampling variability of student
performance within a given grade and subject area. Next we will
consider the consistency of SEIs across the two subject areas of
reading and mathematics but within grade level. In this case, Pearson
correlations are appropriate and are reported in Table 3. The "Total"
column in Table 3 refers to the correlations between the reading and
math SEIs computed from the total sample. Correlations between these
SEIs and the average of the two subsample SEIs were all larger than
.98, indicating that r22 provides a reasonable estimate of the
stability of the SEIs based on the total sample. All correlations in
Table 3 are significant and of moderate size, indicating that, within
the same grade level, student performance in the two subject areas is
reasonably consistent. Although these results are somewhat
disquieting, the correlations do not provide a clear picture of the
inconsistencies which might arise if the objective were to identify
"exceptional" schools based on SEIs for one of the two subject areas.
For this purpose, the SEIs were dichotomized to simulate the
identification of "exceptional" performance. That is, SEIs in excess
of 1.0 (Dyer's criterion for PIs was 1.5 but for many applications
this would be too selective) were considered exceptional and
percentages dealing with decision consistency and coefficient Kappa
were obtained. The results are reported in Table 4.

Insert Table 3 about here

Insert Table 4 about here
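
The dichotomization and Kappa computation described above can be
sketched as follows. This is an illustrative Python sketch only: the
function names and the eight toy SEI values are assumptions made for
the example and are not data from this study. The 1.0 cutoff follows
the definition of "exceptional" given in the text.

```python
def flag_exceptional(seis, cutoff=1.0):
    """Mark an SEI as exceptional when it is in excess of the cutoff."""
    return [s > cutoff for s in seis]


def cohen_kappa(flags_a, flags_b):
    """Chance-corrected agreement between two yes/no classifications."""
    n = len(flags_a)
    agree = sum(a == b for a, b in zip(flags_a, flags_b))
    p_observed = agree / n
    p_a = sum(flags_a) / n                     # share flagged by the first index
    p_b = sum(flags_b) / n                     # share flagged by the second index
    p_chance = p_a * p_b + (1 - p_a) * (1 - p_b)
    # Undefined (division by zero) if both indices flag all or no schools.
    return (p_observed - p_chance) / (1 - p_chance)


# Toy reading and mathematics SEIs for eight hypothetical schools.
reading_sei = [1.4, 0.2, -0.5, 1.1, 0.9, -1.3, 1.6, 0.0]
math_sei = [1.2, 1.3, -0.1, 0.4, 0.7, -0.8, 1.9, 0.3]

kappa = cohen_kappa(flag_exceptional(reading_sei), flag_exceptional(math_sei))
```

In the toy data, six of eight schools are classified the same way by
both subjects, yet Kappa is well below the raw agreement rate because
most of that agreement would be expected by chance when few schools
are flagged.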



The Kappa coefficients range from .52 for first grade to .33
for fourth grade suggesting that decisions based on one or the other
of these two important basic skills become less stable as children
mature and develop. Since the standard errors are very small for
samples of this size, all Kappa coefficients are significant.
However, the percentages of inconsistent classifications provide clear
evidence that two rather different sets of schools would be identified
depending upon whether reading or mathematics were the one basic
skills area selected.

It is important to note that the correlations and results on
decision consistency above reflect the stability of performance of the
same group of students and, therefore, do not reflect inconsistencies
which may be introduced if different grades are considered. Table 5
contains the intercorrelations among grade-specific SEIs. These
correlations are discouragingly small, the majority not achieving
statistical significance at the .05 level. In reading, in particular,
there is essentially no relationship between the SEIs for the four
grades with the exception that fourth grade SEIs are very moderately
related to SEIs reflecting the performance of first and third graders.
Although most of the correlations in the mathematics area are large
enough to achieve significance, this is little solace if they are
considered as parallel forms reliability coefficients. Since the
correlations were so small, analyses based on decision consistency
were considered unnecessary.

Insert Table 5 about here

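
To illustrate what treating the inter-grade correlations as parallel
forms reliabilities would imply, the sketch below applies the general
Spearman-Brown formula, k r / (1 + (k - 1) r), to a composite of four
grade-specific SEIs. This calculation is not part of the original
analysis. It assumes that the median "Total" correlations in Table 5
(.06 for reading, .13 for mathematics) can stand in for the average
inter-grade correlation, and the function name is hypothetical.

```python
def composite_reliability(mean_r: float, k: int) -> float:
    """Reliability of an unweighted composite of k parts with mean intercorrelation mean_r."""
    return k * mean_r / (1.0 + (k - 1) * mean_r)


# Median inter-grade correlations for the total sample, taken from Table 5.
for label, r in (("reading", 0.06), ("mathematics", 0.13)):
    print(label, round(composite_reliability(r, 4), 2))
```

Under these assumptions the four-grade composites would have
cross-grade consistencies of only about .20 for reading and .37 for
mathematics. These figures describe consistency across grades, not
the split-sample stability (r22) reported elsewhere in the paper.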
Results Regarding Aggregation

The results presented to this point suggest that SEIs based
on reading and mathematics performance of the same student cohort are
modestly consistent but that, when the SEIs of students at different
grade levels are related, the results border on randomness. Although
these findings suggest rather strongly that aggregation of such
disparate SEIs will be a fruitless endeavor, for completeness,
unweighted average SEIs were computed across the two dimensions of
interest in this study. First, the average (AVE) of the reading and
math SEIs was obtained at each grade level. The r22 indices of these
SEIs were very similar to those relating to mathematics only (see
Table 2) with the largest difference between the two sets of indices
never exceeding .02. This comparability apparently reflects a
trade-off between an increase which might be expected for a more
comprehensive index and the fact that the reading SEIs are less stable
than those reflecting math performance.

Secondly, averages across the four grades for each of the two
subject areas and AVE were computed (referred to as composite scores).
As might be suspected from the earlier results, this scheme did not
produce the increases in stability we normally expect from
aggregation. The stability across subject areas of the composite
scores was .66 for the total sample, midway between the smallest and
largest grade-specific values of .60 and .70 (see Table 3) and Kappa
was .42, again representative of the values presented in Table 4. The
r22 value associated with the composite reading SEI was .80, in the
range of the grade-specific values presented in Table 2. The
corresponding stability coefficient for mathematics was .90 which is
larger than the grade-specific coefficients which ranged from .84 to
.87. The stability of the composite based on AVE, which corresponds
to an unweighted aggregation across subject areas and grades, was .88,
again approximating the stability of the "mathematics only" composite.

Discussion and Educational Significance
The approach used in this paper for computing SEIs is clearly
not perfect. Arguments concerning the restricted nature of
achievement test data and the limited coverage afforded by tests in
only two subject areas are clearly valid. Furthermore, no attempt has
been made to deal with issues of equity. (The authors acknowledge the
importance of assessing school impact on all pupil subpopulations;
equity issues were not dealt with in this paper for simplicity alone.)
However, taking a general approach which has been found to have
merit by a number of researchers and applying it to large,
longitudinally matched samples appears to be unique. Furthermore, it
seems reasonable to presume that in the elementary grades considered
here, achievement in reading and mathematics should be priority areas
for all schools. The fact that the BSAP tests were developed based on
statewide objectives in reading and mathematics lends further support
for this viewpoint and suggests that they should be reasonably
"curriculum valid." These results cannot easily be discredited.
What, then, are the implications for educational practice?

First, the results should cause "effective schools"
researchers to rethink the concept of an effective school. The
inconsistency of the results across grades strikes at the very heart
of a model which posits school "main effects." In the same vein,
Matthews, et al. (1981), discussing the inconsistency of SEIs across
two school years (different student cohorts) stated "the low
correlations obtained here indicate that high or low performance at a
given grade level in a school may have more to do with the
characteristics of that particular student cohort than with school
effects." (p. 11). Apparently how well a given group of students
achieves in a given subject in a given year, when achievement is gauged
against potential, is only weakly related to similar measures for 
other cohorts. 

Secondly, the results suggest that school effects, at least at
the early grades, may be more or less discernible depending upon the
subject area considered. The majority of the inter-grade SEI
correlations in mathematics, although small, were at least larger than
chance whereas most of those for reading were not. A suggested
explanation of this finding is that young children are more likely to
gain knowledge and skills in areas such as reading from sources outside
the school than is true for areas such as mathematics. A strategy of
identifying effective schools based on mathematics achievement alone in
order to achieve more stable SEIs, although psychometrically rational,
seems educationally unsound.

The results create a serious problem for those charged with
the identification of schools to receive incentive awards based on
student achievement. In an attempt to assess schools in a
comprehensive fashion, the proposed algorithms usually aggregate
grade-subtest SEIs and use the composite index for purposes of award
decisions. This is a logically sensible and politically defensible
approach. Psychometrically, however, it appears to be analogous to
awarding scores to students who randomly responded to a number of test
items in that "true score variance" does not seem to manifest itself.

It is possible that the results simply reflect the different 
goals that school leaders set for themselves each year. Thus, a 
school might successfully impact on the mathematics performance of low 
achieving third and fourth graders as intended, but the matrix of SEIs 
would not demonstrate consistency. This problem appears to be the 
basis for Rowan's (1985) statement, "The best method of measuring
school effectiveness is unknown." (p. 99). For such a model, a
school-specific weighting system would be needed if aggregation were
to be meaningful. 

Common experience suggests that there are effective and
ineffective principals (and other school level staff members) who have
an overall positive or negative effect on what happens in a school.
Empirical support for this position, at least when effectiveness
is measured by residuals from a school level regression analysis, is
another matter.

Table 1

Significant Predictors and Squared Multiple Rs
By Output Variable and Grade

                                                       R2
Grade   N     Output   Significant Predictors   Sub1   Sub2   Tot

1       533   BSAP-R   CSAB, LUNCHF             .45    .46    .48
              BSAP-M   CSAB, LUNCHF             .30    .33    .34
2       519   BSAP-R   BSAP-R, LUNCHF, LUNCHR   .64    .65    .68
              BSAP-M   BSAP-R, LUNCHF           .44    .43    .46
3       523   BSAP-R   BSAP-R, LUNCHF, LUNCHR   .63    .62    .66
              BSAP-M   BSAP-R, BSAP-M, LUNCHF   .36    .29    .34
4       508   CTBS-R   BSAP-R, LUNCHF, LUNCHR   .72    .74    .76
              CTBS-M   BSAP-R, BSAP-M, LUNCHF   .47           .50

Note: To be included as a "significant predictor", a regression
      coefficient was significant (p < .05) for all three
      analyses. This excluded only two cases in which a
      predictor was significant for one of the two subsamples.



Table 2

Intraclass Correlations and Stepped-Up Reliabilities
Measuring Consistency of Subsample SEIs
By Subject Area and Grade

                   Reading         Math
Grade            r11    r22     r11    r22

1                .76    .86     .77    .87
2                .56    .71     .73    .84
3                .63    .77     .76    .86
4                .65    .79     .76    .86
Median (1-4)     .64    .78     .76    .86

Note: Due to rounding, some of the results
      do not precisely agree with the
      Spearman-Brown formula.

Table 3

Pearson Correlations Between Reading and Mathematics SEIs
For Each Subsample and the Total Sample

Grade   Subsample 1   Subsample 2   Total

1       .65           .69           .70
2       .49           .59           .60
3       .55           .54           .60



Table 4

Decision Consistency By Grade
For Reading and Mathematics SEIs
For Total Sample

                Percentages
Grade   - -     + -/- +    + +     Kappa

1       79.7    11.8       8.5     .52
2       78.3    15.6       6.1     .53
3       80.4    12.5       7.1     .46
4       77.8    16.3       5.9     .33

Note: A + sign indicates "exceptional"
      according to the definition in the text.



Table 5

Pearson Correlations Among Grade-Specific SEIs
By Subject Area

                  Reading                  Mathematics
Grades     Sub1    Sub2    Total     Sub1    Sub2    Total

1 & 2      -.02    .01     .02       .12*    .15**
1 & 3      .06     .05     .06       .14*    .08     .14**
1 & 4      .13**   .06     .11*      .08     .04     .08
2 & 3      .09     .00     .07       .08     .17**   .17**
2 & 4      .06     .03     .04       .06     .10*    .11*
3 & 4      .06     .15**   .14**     .09     .06     .11*
Median     .06     .04     .06       .09             .13**

Note: * p < .05; ** p < .01.












